Acronyms

Acronym   Definition
DUT       Device Under Test
GPU       Graphics Processing Unit
MBU       Multi-Bit Upset
NEPP      NASA Electronic Parts and Packaging
PTX       Parallel Thread Execution
RTOS      Real-time Operating System
SBU       Single-Bit Upset
SEE       Single Event Effect
SEFI      Single Event Functional Interrupt
SEU       Single Event Upset
SIMD      Single Instruction Multiple Data
SoC       System on Chip
Outline

• GPU technology
• The setup around the test setup
• Parameter considerations
• Lessons learned
GPU Technology

• Graphics Processing Units (GPU) and General Purpose Graphics Processing Units (GPGPU)
  - Are considered compute devices or coprocessors
  - Are not standalone multiprocessors

• Using high-level languages, GPU-accelerated applications run the sequential part of their workload on the CPU, which is optimized for single-threaded performance, while accelerating parallel processing on the GPU.
GPU Purpose

• GPUs are best used for single instruction, multiple data (SIMD) parallelism
  - Well suited to breaking a large data set into smaller pieces and processing those pieces in parallel

• Key computational pieces of mission applications can be computed using this technique
  - Sensor and science instrument input
  - Object tracking and obstacle identification
  - Algorithm convergence (neural networks)
  - Image processing
  - Data compression algorithms
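The idea of breaking a large data set into smaller pieces and applying the same operation to every piece can be sketched on the host side. This Python sketch only illustrates the decomposition: `scale_and_offset` is a hypothetical per-element kernel, and the thread pool stands in for the GPU's many parallel execution units (Python threads do not give true parallel speedup for CPU-bound work). On a real GPU, each chunk would roughly correspond to a thread block and each element to a thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_chunks(data: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Partition the data set into contiguous pieces of at most chunk_size elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def scale_and_offset(x: float) -> float:
    """Hypothetical per-element kernel: the same operation is applied to every element."""
    return 2.0 * x + 1.0


def process_chunk(chunk: Sequence[float], kernel: Callable[[float], float]) -> List[float]:
    """Apply one kernel to every element of a chunk (single instruction, multiple data)."""
    return [kernel(x) for x in chunk]


def data_parallel_map(data: Sequence[float], chunk_size: int,
                      kernel: Callable[[float], float] = scale_and_offset,
                      workers: int = 4) -> List[float]:
    """Process all chunks concurrently and reassemble the results in the original order."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the original data
        results = pool.map(process_chunk, chunks, [kernel] * len(chunks))
    return [y for chunk_result in results for y in chunk_result]


if __name__ == "__main__":
    samples = [float(i) for i in range(10)]
    print(data_parallel_map(samples, chunk_size=3))
```

Because `pool.map` returns results in input order, the reassembled output lines up element for element with the input, which matters when that output is later compared against a known answer.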
Device Selection

Unfortunately, GPUs come in multiple types, acting either as a primary processor (SoC) or as a coprocessor (discrete GPU).
Nvidia GTX 1050 GPU


Device Software

• Does it need its own operating system?
  - E.g. Linux, Android, RTOS

• Can we just push code at it?
  - E.g. Assembly, PTX, C

• Payload normalization
  - Can we run the same code on the previous and next generations of the device?
  - Not with CUDA code; yes with OpenCL

Real-time Operating System (RTOS)
Parallel Thread Execution (PTX)
CUDA is a parallel computing platform and application programming interface model created by Nvidia
Payloads

[Screenshot: Stable Fluids (512 x 512), 494.2 fps]

• Visual simulations
  - Sample code
  - Fuzzy Donut (i.e. FurMark)
• Sensor streams
  - Camera feed
  - Offline video feed
• Computational loading
  - Scientific computing models
  - Easy math
    - 0 + 0... wait... should = 0
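The "easy math" payload is a known-answer test: the DUT repeatedly computes something whose correct result is known in advance, so any deviation can be logged as a candidate upset. A minimal host-side sketch, assuming a hypothetical `compute_on_dut` callable that stands in for launching the payload on the device and reading its output buffer back; `simulated_dut` is likewise hypothetical and injects one artificial bit flip, not real test data.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Mismatch:
    iteration: int   # which run of the payload
    index: int       # element position in the output buffer
    expected: int
    observed: int


@dataclass
class KnownAnswerReport:
    iterations: int = 0
    bad_reads: int = 0  # runs whose output had the wrong length (possible functional interrupt)
    mismatches: List[Mismatch] = field(default_factory=list)


def run_known_answer_test(compute_on_dut: Callable[[int], List[int]],
                          expected: List[int],
                          iterations: int) -> KnownAnswerReport:
    """Run the payload repeatedly and record every element that differs from the known answer."""
    report = KnownAnswerReport()
    for it in range(iterations):
        observed = compute_on_dut(it)
        report.iterations += 1
        if len(observed) != len(expected):
            report.bad_reads += 1
            continue
        for idx, (exp, obs) in enumerate(zip(expected, observed)):
            if exp != obs:
                # Keep raw values so upsets can be classified offline
                report.mismatches.append(Mismatch(it, idx, exp, obs))
    return report


def simulated_dut(iteration: int) -> List[int]:
    """Hypothetical stand-in for the DUT: 0 + 0 over 8 elements, with one injected upset."""
    result = [0 + 0] * 8
    if iteration == 3:
        result[5] = 0x10  # simulated bit flip
    return result


if __name__ == "__main__":
    report = run_known_answer_test(simulated_dut, expected=[0] * 8, iterations=5)
    print(report.iterations, report.bad_reads, report.mismatches)
```

Keeping the check on the arbiter rather than on the DUT means a corrupted DUT cannot also corrupt the verdict.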
Test Setup

• Things to consider in the test environment
  - Operating system daemons
  - Location of payload and results
  - Upstream/downstream data paths
  - Control of electrical sources
  - Temperature control (e.g. heaters) in a vacuum

• Things to consider in the device under test (DUT)
  - Is the die accessible?
  - What functional blocks are accessible?
  - Which functions are independent of each other?
  - Does it have proprietary or open software?
Test Environment

• Beam line
  - DUT testing zone, where collateral damage can happen
  - Shielding for everything that is not the DUT

• Operator area
  - Cables, interconnects and extenders
  - Signal integrity at a distance
  - “Everything that was done in a lab, in front of you on a bench, now must be done from a distance...”
Test Environment (Cont’d)
Arbiter Platform

• Hardware info gathering
  - Thermocouples
DUT Health Status

• Accessible nodes
  - Network
    - Heartbeat by inbound ping
    - Heartbeat by timestamp upload
  - Peripheral response
    - “Num Lock”
  - Visual check
    - Remote
    - Local
    - Local with remote viewing
  - Electrical states
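The two network heartbeats can be combined on the arbiter side into a simple health classifier. A sketch, assuming the arbiter records when the last inbound ping succeeded and when the DUT last uploaded a timestamp; the 5-second timeout and the state names are illustrative assumptions, not values from the test campaign. Times are passed in explicitly (in practice they could come from `time.monotonic()` on the arbiter).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DutState(Enum):
    HEALTHY = "healthy"                  # ping and timestamp uploads both fresh
    PAYLOAD_STALLED = "payload stalled"  # network stack answers, but no fresh timestamps
    PING_LOST = "ping lost"              # timestamps arrive, but pings go unanswered
    UNRESPONSIVE = "unresponsive"        # neither heartbeat: candidate SEFI


@dataclass
class HeartbeatMonitor:
    timeout_s: float = 5.0                        # hypothetical staleness threshold
    last_ping_ok: Optional[float] = None          # arbiter clock, seconds
    last_timestamp_upload: Optional[float] = None

    def record_ping(self, now: float) -> None:
        self.last_ping_ok = now

    def record_upload(self, now: float) -> None:
        self.last_timestamp_upload = now

    def _fresh(self, t: Optional[float], now: float) -> bool:
        return t is not None and (now - t) <= self.timeout_s

    def classify(self, now: float) -> DutState:
        ping_ok = self._fresh(self.last_ping_ok, now)
        upload_ok = self._fresh(self.last_timestamp_upload, now)
        if ping_ok and upload_ok:
            return DutState.HEALTHY
        if ping_ok:
            return DutState.PAYLOAD_STALLED
        if upload_ok:
            return DutState.PING_LOST
        return DutState.UNRESPONSIVE
```

Distinguishing a stalled payload from a fully unresponsive DUT helps decide whether a payload restart is enough or a remote power cycle is needed.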
Monitoring Data

Voltage Rails
Monitoring Data (Cont’d)

• Significant digits are important

• Resolution is needed for correlation
  - Faster sampling speed
  - Smaller units (µV or mV, not volts)
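Why resolution matters for correlation can be shown with a few numbers. The sketch below uses made-up illustrative samples, not measured data: it quantizes rail readings to an instrument step and flags excursions above a baseline. A 30 mV spike on a nominal 1.0 V rail is visible at millivolt resolution but disappears when readings are rounded to whole volts. `quantize` and `find_spikes` are hypothetical helpers, and the 10 mV threshold is an arbitrary choice for the example.

```python
from typing import List


def quantize(volts: float, step: float) -> float:
    """Round a reading to the instrument's resolution (step in volts)."""
    return round(volts / step) * step


def find_spikes(samples: List[float], step: float, threshold: float) -> List[int]:
    """Return indices whose quantized value exceeds the quantized baseline by more than threshold.

    The baseline is the (upper) median of the quantized samples, so a single spike
    does not shift it.
    """
    q = [quantize(v, step) for v in samples]
    baseline = sorted(q)[len(q) // 2]
    return [i for i, v in enumerate(q) if v - baseline > threshold]


if __name__ == "__main__":
    # Illustrative samples from a nominal 1.0 V rail with one 30 mV spike at index 3
    rail = [1.000, 1.001, 0.999, 1.030, 1.000, 1.001]
    print("1 mV resolution:", find_spikes(rail, step=0.001, threshold=0.010))
    print("1 V resolution: ", find_spikes(rail, step=1.0, threshold=0.010))
```

The same reasoning applies in time: if the sampling interval is longer than the spike, the spike is never recorded regardless of voltage resolution.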
Monitoring Data (Cont’d)

• Even better (albeit a mock-up):
Failures
Learning Experience

• Every test is another learning experience
  - “Is the laser alignment jig in the beam path...?”

• Nuances with controllable nodes
  - DUT power switch
  - Remote power sources
  - DUT electrical isolation from the test platform
  - Thermal paths

• Improvements are always possible, but preparation time may not be as abundant
• Prioritization during development is important
  - Software payload
  - Hardware monitoring
  - Remote troubleshooting capabilities
Conclusion

• NEPP and its partners have conducted proton, neutron and heavy ion testing on several devices
  - Have captured SEUs (SBU and MBU)
  - Have seen traceable current spikes
  - But have predominantly encountered system-based SEFIs
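Separating SBUs from MBUs in logged data comes down to counting flipped bits. A sketch, assuming each logged mismatch provides the expected and observed value of one memory word: XOR the two values, and the number of set bits in the result is the number of upset bits in that word. This per-word rule is a simplification; MBU definitions used in practice may also group flips across physically adjacent words, which requires layout information not modeled here.

```python
from typing import Dict, Iterable, Tuple


def flipped_bits(expected: int, observed: int) -> int:
    """Number of bit positions that differ between the expected and observed word."""
    return bin(expected ^ observed).count("1")


def classify_upset(expected: int, observed: int) -> str:
    """Label one word: 'none', 'SBU' (one bit flipped) or 'MBU' (more than one)."""
    n = flipped_bits(expected, observed)
    if n == 0:
        return "none"
    return "SBU" if n == 1 else "MBU"


def tally(words: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """Count SBU and MBU events over (expected, observed) pairs."""
    counts = {"SBU": 0, "MBU": 0}
    for exp, obs in words:
        label = classify_upset(exp, obs)
        if label != "none":
            counts[label] += 1
    return counts
```

For example, an expected word of `0x00` read back as `0x10` is one SBU, while `0x30` is one MBU (two bits).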


• GPU testing requires a complex platform to arbitrate the test vectors, monitor the DUT (in multiple ways) and record data
  - None of these functions should require the DUT itself to reliably perform a task outside of being exercised

• Progress has been made in proving out multiple ways to simulate and enumerate activity on the DUT
  - Narrowing down on a universal test bench
  - End goal is to make test code platform independent